Learning Expressive Computational Models of Gene Regulatory Sequences and Responses

Author

  • Keith Noto
Abstract

The regulation and responses of genes involve complex systems of relationships between genes, proteins, DNA, and a host of other molecules that are involved in every aspect of cellular activity. I present algorithms that learn expressive computational models of cis-regulatory modules (CRMs) and gene-regulatory networks. These models are expressive because they are able to represent key aspects of interest to biologists, often involving unobserved underlying phenomena. The algorithms presented in this thesis are designed specifically to learn in these expressive model spaces. I have developed a learning approach based on models of CRMs that represent not only the standard set of transcription factor binding sites but also logical and spatial relationships between them. I show that my expressive models learn more accurate representations of CRMs in genomic data sets than current state-of-the-art learners and several less expressive baseline models. I have developed a probabilistic version of these CRM models that is closely related to hidden Markov models. I show how these models can perform inference and learn parameters efficiently when processing long promoter sequences, and that these expressive probabilistic models are also more accurate than several baselines. Another contribution presented in this thesis is the development of a general-purpose regression learner for sequential data. This approach is used to discover mappings from sequence features in DNA (e.g., transcription or sigma factor binding sites) to real-valued responses (e.g., transcription rates). The key contribution of this approach is its ability to use the real values directly to discover
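The abstract does not say how logical and spatial relationships between binding sites are encoded. As a rough illustration only, the Python sketch below treats a CRM as a boolean predicate over a promoter sequence: each binding site is a consensus string (a simplification, since position weight matrices are more common), an ordered pair of sites with a maximum gap expresses an AND together with a spatial constraint, and alternative configurations are joined with OR. The functions `site_starts`, `ordered_pair`, and `toy_crm`, as well as the consensus strings and gap limits, are hypothetical and are not taken from the thesis.

```python
import re

# Subset of IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}


def site_starts(consensus: str, seq: str) -> list[int]:
    """Start positions of (possibly overlapping) matches of a consensus site."""
    body = "".join(IUPAC[c] for c in consensus)
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={body})", seq)]


def ordered_pair(seq: str, first: str, second: str, max_gap: int) -> bool:
    """AND plus a spatial constraint: `first` lies upstream of `second`, with
    0..max_gap bases between the end of `first` and the start of `second`."""
    seq = seq.upper()
    seconds = site_starts(second, seq)
    for i in site_starts(first, seq):
        end = i + len(first)
        if any(0 <= j - end <= max_gap for j in seconds):
            return True
    return False


def toy_crm(seq: str) -> bool:
    """Hypothetical CRM (arbitrary example sites):
    (TGACTCA upstream of CACGTG within 20 bp) OR
    (GGGRNTTTCC upstream of CACGTG within 10 bp)."""
    return (ordered_pair(seq, "TGACTCA", "CACGTG", max_gap=20)
            or ordered_pair(seq, "GGGRNTTTCC", "CACGTG", max_gap=10))
```

For example, `toy_crm("TGACTCA" + "A" * 5 + "CACGTG")` holds because the first site ends 5 bases before the second begins, whereas swapping the two sites violates the ordering constraint. The probabilistic CRM models described in the abstract are related to hidden Markov models and would assign likelihoods rather than this all-or-nothing answer.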

Similar resources

Computational Models of Gene Regulatory Sequences and Responses

In silico screening of G-Quadruplex Structures in Wilms tumor 1 Gene Promoter

Introduction: X-ray diffraction studies have revealed that guanines in DNA strands may be arranged in quartets to form structures called G-quadruplexes. Bioinformatics studies have suggested that G-quadruplex structures form in crucial human genes, including Wilms tumor 1 (WT1). The aim of this study was to analyze in silico the guanine-rich sequence in the promoter region of the WT1 gene...

RNA-Seq Bayesian Network Exploration of Immune System in Bovine

Background: Stress is one of the main factors affecting production systems. Several factors (both genetic and environmental) regulate the immune response to stress. Objectives: To determine the major immune-system regulatory genes underlying stress responses, a Bayesian network learning approach for those regulatory genes was applied to RNA-...

CTLA4 Gene Variants in Autoimmunity and Cancer: a Comparative Review

Gene association studies are less appealing in cancer than in autoimmune diseases. Complexity, heterogeneity, variation in histological types, age at onset, short survival, and acute versus chronic conditions are cancer-related factors that differ from those of an organ-specific autoimmune disease, such as Graves' disease, for which a large body of multicentre data has accumulated. For years t...

Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds

The aim of this study was to perform molecular and bioinformatics analysis of the IGFBP2 gene promoter in association with some economic traits in the indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB sheep, and a 297 bp fragment from the upstream sequence of the studied gene was amplified and genotyped by single-strand conformational polymo...

Publication date: 2007